Indexability of Restless Bandit Problems and Optimality of Index Policies for Dynamic Multichannel Access

Authors

  • Keqin Liu
  • Qing Zhao

Abstract

We consider an opportunistic communication system consisting of multiple independent channels with time-varying states. With limited sensing, a user can only sense and access a subset of channels and accrue rewards determined by the states of the sensed channels. We formulate the problem of optimal sequential channel selection as a restless multi-armed bandit process. We establish the indexability and obtain Whittle’s index in closed-form for both discounted and average reward criteria. These results lead to the direct implementation of Whittle’s index policy with remarkably low complexity. When channels are stochastically identical, we show that Whittle’s index policy is optimal under certain conditions. Furthermore, it has a semi-universal structure that obviates the need to know the channel transition probabilities. The optimality and the semi-universal structure result from the equivalency between Whittle’s index policy and the myopic policy established in this work. For non-identical channels, we develop efficient algorithms for computing a performance upper bound resulting from Lagrangian relaxation. The tightness of the upper bound and the near-optimal performance of Whittle’s index policy are illustrated with simulation examples.

Index Terms: Opportunistic access, dynamic channel selection, restless multi-armed bandit, Whittle’s index, indexability, myopic policy.

This work was supported by the Army Research Laboratory CTA on Communication and Networks under Grant DAAD1901-2-0011 and by the National Science Foundation under Grants ECS-0622200 and CCF-0830685. Part of this work was presented at the 5th IEEE Conference on Sensor, Mesh and Ad Hoc Communications and Networks (SECON) Workshops (June, 2008) and the IEEE Asilomar Conference on Signals, Systems, and Computers (October, 2008).
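The abstract states that, for stochastically identical channels, Whittle’s index policy coincides with the myopic policy. The following is a minimal Python sketch of such a myopic policy. It rests on assumptions that the abstract does not state: each channel is a two-state (good/bad) Markov chain with transition probabilities `p01` (bad to good) and `p11` (good to good), sensing reveals a channel's current state perfectly, and each sensed good channel yields a reward of 1. The function names are hypothetical. The sketch does not compute the paper's closed-form Whittle index, and it does not check the conditions under which the paper proves optimality.

```python
import random
from typing import List


def update_belief(omega: float, sensed: bool, observed_good: bool,
                  p01: float, p11: float) -> float:
    """Next-slot probability that a channel is good (assumed two-state Markov model)."""
    if sensed:
        # Sensing reveals the current state, so the next-slot belief is the
        # transition probability out of the observed state.
        return p11 if observed_good else p01
    # Unsensed channel: propagate the belief one step through the transition matrix.
    return omega * p11 + (1.0 - omega) * p01


def myopic_select(beliefs: List[float], k: int) -> List[int]:
    """Pick the k channels with the largest belief, i.e. the largest expected immediate reward."""
    order = sorted(range(len(beliefs)), key=lambda i: beliefs[i], reverse=True)
    return sorted(order[:k])


def simulate(n_channels: int, k: int, p01: float, p11: float,
             horizon: int, seed: int = 0) -> float:
    """Average reward per slot of the myopic policy on hypothetical i.i.d. Markov channels."""
    rng = random.Random(seed)
    # Stationary probability of the good state: p01 / (1 - p11 + p01).
    stationary = p01 / (1.0 - p11 + p01)
    states = [rng.random() < stationary for _ in range(n_channels)]
    beliefs = [stationary] * n_channels
    total = 0
    for _ in range(horizon):
        chosen = set(myopic_select(beliefs, k))
        for i in range(n_channels):
            if i in chosen:
                total += int(states[i])  # unit reward for each sensed good channel
            beliefs[i] = update_belief(beliefs[i], i in chosen, states[i], p01, p11)
        # Channels then evolve independently to the next slot.
        states = [rng.random() < (p11 if s else p01) for s in states]
    return total / horizon
```

For example, `simulate(n_channels=5, k=2, p01=0.2, p11=0.8, horizon=10000)` returns an estimate of the average number of good channels sensed per slot. Ranking channels by belief is used here because, under this assumed model, the belief equals the expected immediate reward of sensing that channel.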

Similar sources

Index Policies for a Class of Discounted Restless Bandits

The paper concerns a class of discounted restless bandit problems which possess an indexability property. Conservation laws yield an expression for the reward suboptimality of a general policy. These results are utilised to study the closeness to optimality of an index policy for a special class of simple and natural dual speed restless bandits for which indexability is guaranteed. The strong p...

Restless Bandit Marginal Productivity Indices, Diminishing Returns, and Optimal Control of Make-to-Order/Make-to-Stock M/G/1 Queues

This paper presents a framework grounded on convex optimization and economics ideas to solve by index policies problems of optimal dynamic allocation of effort to a discrete-state (finite or countable) binary-action (work/rest) semi-Markov restless bandit project, elucidating issues raised by previous work. Its contributions include: (i) the concept of a restless bandit’s marginal productivity ...

Asymptotically optimal priority policies for indexable and non-indexable restless bandits

We study the asymptotic optimal control of multi-class restless bandits. A restless bandit is a controllable stochastic process whose state evolution depends on whether or not the bandit is made active. Since finding the optimal control is typically intractable, we propose a class of priority policies that are proved to be asymptotically optimal under a global attractor property an...

Asymptotic optimal control of multi-class restless bandits

We study the asymptotic optimal control of multi-class restless bandits. A restless bandit is a controllable process whose state evolution depends on whether or not the bandit is made active. The aim is to find a control that determines at each decision epoch which bandits to make active in order to minimize the overall average cost associated to the states the bandits are in. Sinc...

A Restless Bandit Formulation of Multi-channel Opportunistic Access: Indexablity and Index Policy

We focus on an opportunistic communication system consisting of multiple independent channels with time-varying states. With limited sensing, a user can only sense and access a subset of channels and accrue rewards determined by the states of the sensed channels. We formulate the problem of optimal sequential channel selection as a restless multi-armed bandit process, for which a powerful index...

Publication date: 2008